GENERALIZED PACKAGES FOR ANALYSIS OF VARIANCE AND CATEGORICAL DATA


This paper groups (a) analysis of variance and (b) categorical data problems into several classes and then describes general software packages that can analyze all of the classes of problems defined. The strengths and weaknesses of a variety of software packages are compared in terms of the classes of problems they can handle and the ease with which they can be used. A method for analyzing unbalanced split-plot designs with currently available software is also described.

ANALYSIS OF VARIANCE


Unbalanced Designs

Psychologists doing field research often have unequal sample sizes in the different cells of their analysis of variance designs. When sample sizes are unequal, the design is considered unbalanced. The least-squares analysis of unbalanced designs raises some statistical complications, and appropriate software is more difficult to find.

Statistical complications with unbalanced designs include several threats to the validity of results. With balanced designs, the sums of squares that enter the numerators of the F ratios for the different terms in the model are independent of one another. The denominators are not independent, since the same error sum of squares is used to test a variety of terms in the model. With unbalanced designs, however, neither the numerator nor the denominator sums of squares are independent across F ratios, which may aggravate problems of Type I error, particularly when many F tests are made on a large number of terms in an unbalanced design. With split-plot analysis of variance designs (in which one or more factors are repeated-measures factors), F tests are approximate rather than exact, particularly when unbalance exists. In unbalanced split-plot designs, the expected mean square coefficient for a term in the numerator of an F ratio will not be exactly the same as the coefficient for the same term in the denominator, and the difference grows as the unbalance becomes greater.

Another complication arises in the way the variance is partitioned with unbalanced designs. The researcher has the option of partitioning the variance in a variety of ways, depending on his objectives. When unbalance exists, the different terms in the model are also confounded with one another. In other words, the expected mean squares for each term in the model contain a variety of extraneous components. These extraneous components include some or all of the expected mean square components for terms that come after the term of interest in the model statement.
The researcher's objective may be to construct F ratios whose mean squares contain the same expected mean square components as those found in the analogous balanced analysis of variance design; in other words, to unconfound the expected mean squares for all terms in the model. In a fixed-effect factorial design, this means adjusting each term for all the other terms in the model (by ordering each term of interest last in the model statement). When this is done, the sums of squares for the terms in the model no longer add up to the total sum of squares for the design; they generally fall short of it. This approach assigns to each term in the model only the portion of the variance that is unconfounded and eliminates the confounded portion of the variance.
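The adjustment can be made concrete with a small computation. The Python sketch below fits an illustrative unbalanced 2 × 2 fixed-effect design by least squares and computes each term's sum of squares adjusted for all other terms, i.e., the increase in residual sum of squares when that term alone is dropped from the full model. The data, the ±1 effect coding, and the helpers `solve` and `rss` are assumptions made for this example, not features of any package discussed in this paper.

```python
# Illustrative sketch: unbalanced 2x2 fixed-effect design, each term adjusted
# for all other terms. Data and helper names are hypothetical.


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def rss(columns, y):
    """Residual sum of squares after regressing y on the given design columns."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[i] * columns[i][obs] for i in range(k)) for obs in range(len(y))]
    return sum((yo - fo) ** 2 for yo, fo in zip(y, fitted))


# (a, b, y): factor levels effect-coded as +1/-1; cell sizes are 3, 1, 1, 3
data = [(1, 1, 3), (1, 1, 4), (1, 1, 5), (1, -1, 2),
        (-1, 1, 3), (-1, -1, 0), (-1, -1, 1), (-1, -1, 2)]
y = [obs[2] for obs in data]
cols = {
    "mean": [1] * len(data),
    "A": [obs[0] for obs in data],
    "B": [obs[1] for obs in data],
    "AB": [obs[0] * obs[1] for obs in data],
}
terms = ["A", "B", "AB"]
full_rss = rss([cols[t] for t in ["mean"] + terms], y)
ss_model = rss([cols["mean"]], y) - full_rss

# Each term "ordered last": the increase in RSS when it alone is dropped
adjusted = {t: rss([cols[u] for u in ["mean"] + terms if u != t], y) - full_rss
            for t in terms}
print("adjusted SS:", adjusted)
print("sum of adjusted SS:", sum(adjusted.values()), "model SS:", ss_model)
```

With these data the cell means are exactly additive, so the adjusted sum of squares for AB is zero, while those for A and B come to about 1.5 and 6.0. Their total of about 7.5 falls well short of the model sum of squares of 14; the difference is the confounded variance that this approach discards.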

The researcher may not wish to partition the variance so that the expected mean squares are unconfounded by extraneous components. Instead, the researcher may wish to partition the variance hierarchically, so that the sums of squares for the terms in the model, together with the error sum of squares, add up to the total sum of squares. This type of partitioning is done when the researcher is willing to assume, for theoretical or practical reasons, that some terms take precedence over others, e.g., main effects over interactions. When these assumptions are made, the expected mean squares for the terms that take precedence are still confounded with extraneous expected mean square components. However, assuming that some variables take precedence over others, and then partitioning the variance hierarchically in a manner consistent with these assumptions, is equivalent to taking the confounded portion of the variance and attributing it to the variable that takes precedence rather than to the variables confounded with it. This approach assigns unconfounded variance to the appropriate terms in the model and then, by the assumption of precedence, assigns the confounded portion of the variance to one term rather than to the other terms confounded with it.
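A hierarchical partition can be sketched the same way. The self-contained Python example below uses an illustrative unbalanced 2 × 2 data set (cell sizes 3, 1, 1, 3) and hypothetical helpers; each term is adjusted only for the terms entered before it, and the partition is computed for two different orders of the main effects.

```python
# Illustrative sketch: hierarchical (sequential) sums of squares for an
# unbalanced 2x2 design. Data and helper names are hypothetical.


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def rss(columns, y):
    """Residual sum of squares after regressing y on the given design columns."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[i] * columns[i][obs] for i in range(k)) for obs in range(len(y))]
    return sum((yo - fo) ** 2 for yo, fo in zip(y, fitted))


# (a, b, y) with effect-coded levels +1/-1
data = [(1, 1, 3), (1, 1, 4), (1, 1, 5), (1, -1, 2),
        (-1, 1, 3), (-1, -1, 0), (-1, -1, 1), (-1, -1, 2)]
y = [obs[2] for obs in data]
cols = {
    "mean": [1] * len(data),
    "A": [obs[0] for obs in data],
    "B": [obs[1] for obs in data],
    "AB": [obs[0] * obs[1] for obs in data],
}


def sequential_ss(order):
    """Each term adjusted only for the terms entered before it."""
    entered = [cols["mean"]]
    previous = rss(entered, y)
    result = {}
    for term in order:
        entered = entered + [cols[term]]
        current = rss(entered, y)
        result[term] = previous - current
        previous = current
    return result


a_first = sequential_ss(["A", "B", "AB"])
b_first = sequential_ss(["B", "A", "AB"])
print("A entered first:", a_first, "total:", sum(a_first.values()))
print("B entered first:", b_first, "total:", sum(b_first.values()))
```

In both orders the sequential sums of squares add up to the model sum of squares of 14. What changes is which term receives the confounded variance: A receives about 8 when it is entered first but only about 1.5 when it is entered after B.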


Multivariate Analysis of Variance

Psychologists doing field research face not only unbalanced designs but also designs with multiple dependent variables. A separate univariate analysis of variance is often computed for each of a large number of dependent variables, which inflates the Type I error rate. When multiple univariate F tests are made, some will be significant by chance alone. Interpreting the results becomes a problem when the number of significant univariate F tests is not much beyond what might be expected by chance.
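The size of the inflation is easy to see under a simplifying assumption that the paper does not make: if k tests are independent and each is made at level α, the probability of at least one Type I error is 1 − (1 − α)^k. The Python sketch below computes this quantity; the function name is hypothetical, and correlated dependent variables, which are the usual case in practice, change the figure, so the numbers are illustrative only.

```python
def familywise_error(alpha, k):
    """P(at least one false rejection) among k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 10, 20):
    print(k, round(familywise_error(0.05, k), 3))
```

With ten independent tests at the .05 level, for example, the chance of at least one spurious significant result is about .40.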

Multivariate analysis of variance controls for this type of inflation of Type I error. It reduces each subject's scores on the dependent variables to a single number, a simple linear combination of the subject's scores on each of the original dependent variables. Multivariate analysis of variance consists of a search for the linear combination of dependent variables that discriminates best between levels of an independent variable, in the sense of producing the largest possible univariate F ratio (Harris, 1975). The significance of this largest possible F ratio is judged against a critical value appropriate to it, one that takes into account the extreme capitalization on chance made in arriving at it. The linear combination of dependent variables that discriminates best between levels of an independent variable is the same as the primary discriminant function found with discriminant analysis.
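For two dependent variables this search can be written out directly. In the standard formulation (not specific to any package discussed here), the best combination is the eigenvector of W⁻¹B belonging to its largest eigenvalue λ, where B and W are the between-groups and within-groups sums-of-squares-and-cross-products matrices; Roy's largest-root criterion is based on this eigenvalue, and the univariate F of the combined score equals λ(N − g)/(g − 1) for N subjects in g groups. The Python sketch below assumes exactly two dependent variables so that the 2 × 2 eigenproblem has a closed-form solution; the data and function names are illustrative, and the sketch does not compute the special critical value mentioned above.

```python
import math


def scatter_matrices(groups):
    """Between-groups (B) and within-groups (W) SSCP matrices for two dependent variables."""
    obs = [o for g in groups for o in g]
    grand = [sum(o[k] for o in obs) / len(obs) for k in range(2)]
    B = [[0.0, 0.0], [0.0, 0.0]]
    W = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        m = [sum(o[k] for o in g) / len(g) for k in range(2)]
        for i in range(2):
            for j in range(2):
                B[i][j] += len(g) * (m[i] - grand[i]) * (m[j] - grand[j])
                W[i][j] += sum((o[i] - m[i]) * (o[j] - m[j]) for o in g)
    return B, W


def largest_root(groups):
    """Largest eigenvalue of W^-1 B and a matching eigenvector (discriminant weights)."""
    B, W = scatter_matrices(groups)
    det_w = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    w_inv = [[W[1][1] / det_w, -W[0][1] / det_w],
             [-W[1][0] / det_w, W[0][0] / det_w]]
    M = [[sum(w_inv[i][k] * B[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    (p, q), (r, s) = M
    half_trace = (p + s) / 2
    lam = half_trace + math.sqrt(max(half_trace ** 2 - (p * s - q * r), 0.0))
    if q != 0:
        weights = (q, lam - p)
    elif r != 0:
        weights = (lam - s, r)
    else:
        weights = (1.0, 0.0) if p >= s else (0.0, 1.0)
    return lam, weights


def univariate_f(groups_scores):
    """One-way ANOVA F ratio for a list of groups of scalar scores."""
    scores = [x for g in groups_scores for x in g]
    n_total, k = len(scores), len(groups_scores)
    grand = sum(scores) / n_total
    means = [sum(g) / len(g) for g in groups_scores]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups_scores, means))
    ssw = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups_scores, means))
    return (ssb / (k - 1)) / (ssw / (n_total - k))


groups = [
    [(2, 3), (3, 3), (4, 5), (3, 4)],
    [(4, 4), (5, 6), (6, 5), (5, 5)],
    [(3, 6), (4, 7), (5, 8), (4, 7)],
]
lam, (w1, w2) = largest_root(groups)
combined = [[w1 * y1 + w2 * y2 for y1, y2 in g] for g in groups]
f_max = univariate_f(combined)
n_total, n_groups = 12, 3
print("largest root:", lam, "F of best combination:", f_max)
print("lambda * (N - g) / (g - 1):", lam * (n_total - n_groups) / (n_groups - 1))
```

Because the combination is chosen to maximize F, the F for either dependent variable taken alone can be no larger than the F of the combined score, which is why the maximized F cannot be referred to the ordinary F table.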

Since multivariate test statistics are based on a linear combination of dependent variables, there is no necessary one-to-one relationship between the univariate F ratios and the multivariate test statistic. In other words, it is possible to have significant univariate F ratios without a significant multivariate test statistic, or the reverse, a significant multivariate test statistic with no significant univariate F ratios. It is informative, however, when the researcher finds significance with the univariate F ratios but not with the multivariate test statistic. In that case, the significant univariate F ratios are best interpreted as due to chance, i.e., to the inflated Type I error that occurred when multiple F tests were made.

The nonsignificant multivariate test statistic indicates that no linear combination of the dependent variables could be found that produces a significant univariate F. In many cases, psychologists may wish to run multivariate analyses of variance to find out whether the significance found with multiple univariate analyses is real or merely the product of inflated Type I error.

Each univariate analysis of variance design has a multivariate analogue. As mentioned previously, it is difficult to find appropriate software that can handle unbalanced analysis of variance designs, and even more difficult to find software for unbalanced multivariate analysis of variance designs. However, programs that can handle these designs are available.


Random and Mixed Effects

As mentioned previously, analysis of variance designs can be classified as (a) balanced or unbalanced and (b) univariate or multivariate. Other classifications of analysis of variance designs are also important: (c) the classification of the design as a random, fixed, or mixed model; and (d) the classification of the design as one with or without repeated-measures factors.
In the random model, the levels of the independent variables are randomly selected, and the researcher wishes to generalize the results to all levels of the independent variable within the population of interest. In the fixed model, the levels selected exhaust the population of interest, and the researcher wishes to generalize only to the levels that have been selected. In the mixed model, some independent variables are fixed effects and some are random effects.

The classification of a design as a random, fixed, or mixed model affects the selection of the appropriate error term. With the random or mixed model, the expected mean squares for some or all terms in the model contain additional components that are not found in the fixed model. To test the significance of a term whose expected mean square contains these additional components, the error term must contain the same additional components in its expected mean square. With the random or mixed model it is sometimes necessary to construct an error term with the appropriate expected mean square, including the appropriate additional components, by doing arithmetic on the mean squares of selected terms in the model, and then to make a quasi-F test with the constructed error term. It is also common to find with random or mixed models that error terms with appropriate expected mean squares exist but have too few degrees of freedom to make tests that are at all powerful.
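As an illustration of a constructed error term, consider a hypothetical three-factor design with A fixed, B and C random, b and c levels of B and C, and n observations per cell. Under the usual expected mean squares for that model (an assumption of this example, not taken from the paper), E(MS_A) = σ² + nσ²_ABC + ncσ²_AB + nbσ²_AC + nbcθ²_A, and no single mean square has this expectation minus θ²_A. The combination MS_AB + MS_AC − MS_ABC does, and Satterthwaite's formula gives approximate degrees of freedom for it. The Python sketch below computes the quasi-F ratio and those degrees of freedom; the mean squares and degrees of freedom are made-up numbers, and the function name is hypothetical.

```python
def quasi_f(ms_numerator, denominator_terms):
    """Quasi-F ratio with Satterthwaite degrees of freedom for the constructed denominator.

    denominator_terms: list of (coefficient, mean_square, df) combined linearly,
    e.g. [(1, MS_AB, df_AB), (1, MS_AC, df_AC), (-1, MS_ABC, df_ABC)].
    """
    ms_den = sum(c * ms for c, ms, _ in denominator_terms)
    if ms_den <= 0:
        raise ValueError("constructed error mean square is not positive")
    # Satterthwaite: (sum c*MS)^2 / sum((c*MS)^2 / df)
    df_den = ms_den ** 2 / sum((c * ms) ** 2 / df for c, ms, df in denominator_terms)
    return ms_numerator / ms_den, df_den


# Hypothetical mean squares: MS_A = 120 on 2 df; AB, AC, ABC as listed
f_ratio, df_den = quasi_f(120.0, [(1, 20.0, 4), (1, 15.0, 6), (-1, 5.0, 12)])
print(f"quasi-F = {f_ratio:.2f} on 2 and {df_den:.2f} df")
```

Here the quasi-F of 4.0 would be referred to an F distribution with 2 and about 6.45 degrees of freedom. Satterthwaite's approximation is one standard choice for the denominator degrees of freedom, and the small value illustrates the low power noted above.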


Split-Plot Designs

The final classification of analysis of variance designs made here is between designs that contain repeated-measures factors and those that do not. Designs that have some factors with repeated measures and some factors without repeated measures are often called split-plot designs. In these designs, multiple measurements are made on the same subject or unit of analysis. These designs create a factor for differences between subjects (or units), and the variance for this term is extracted from the error variance used to test the repeated-measures terms. Error variance is reduced in this manner, but so are the degrees of freedom for the error term(s) used to test the repeated terms. These models also require the more restrictive assumption of constant correlation between responses at all levels of each repeated-measures factor.
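The extraction of between-subjects variance can be seen in the simplest repeated-measures case: one group of subjects measured at every level of a single factor R. The Python sketch below (illustrative data; function name hypothetical) partitions the total sum of squares into subjects, R, and residual, and also reports the error term that would result if subject differences were left in the error.

```python
def repeated_measures_partition(scores):
    """Partition SS for one group of subjects measured at every level of factor R.

    scores[s][k] is the score of subject s at level k of R.
    Returns {source: (sum of squares, degrees of freedom)}.
    """
    n, r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * r)
    subject_means = [sum(row) / r for row in scores]
    level_means = [sum(row[k] for row in scores) / n for k in range(r)]
    ss_total = sum((y - grand) ** 2 for row in scores for y in row)
    ss_subjects = r * sum((m - grand) ** 2 for m in subject_means)
    ss_r = n * sum((m - grand) ** 2 for m in level_means)
    return {
        "subjects": (ss_subjects, n - 1),
        "R": (ss_r, r - 1),
        "residual": (ss_total - ss_subjects - ss_r, (n - 1) * (r - 1)),
        # one-way error term if subject differences stayed in the error variance
        "error ignoring subjects": (ss_total - ss_r, r * (n - 1)),
    }


scores = [[2, 3, 4], [4, 6, 5], [6, 9, 6]]
res = repeated_measures_partition(scores)
for source, (ss, df) in res.items():
    print(f"{source:24s} SS = {ss:6.2f}  df = {df}")
```

Here the residual error term is 4 on 4 degrees of freedom, compared with 28 on 6 degrees of freedom when subject differences are left in the error: the error variance shrinks, and so do its degrees of freedom.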


Software Comparisons

Analysis of variance software packages can be compared in terms of their generality, the extent to which they handle the categories of designs described previously: (a) balanced or unbalanced; (b) univariate or multivariate; (c) random, fixed, or mixed; and (d) with or without repeated measures (split-plot or not).
The program has only three preestablished ways of partitioning the variance: (a) the "classic" option, in which main effects are adjusted for main effects only and interactions are adjusted for main effects and interactions; (b) the "regression" option, in which each term is adjusted for all other terms; and (c) the "hierarchical" option, in which each term is adjusted only for the terms preceding it. With unbalanced designs, the regression option produces mean squares that are unconfounded, by eliminating the confounded variance in the manner described previously. Unfortunately, a programming error in this option in release 06 means that it cannot be used. A multivariate analysis of variance program is planned for SPSS release 07. The OSIRIS package contains a multivariate analysis of variance program, MANOVA, which can handle univariate or multivariate analyses of variance for balanced or unbalanced designs under the fixed model. This program cannot handle split-plot designs.


The most general analysis of variance programs that the author is aware of are (a) the MAD/RUMMAGE program (Bryce & Carter, 1974) and (b) the MULTIVARIANCE program (Finn, 1974).¹ MAD is the original and RUMMAGE the updated version of the same generalized analysis of variance program; RUMMAGE is updated and improved on a continuing basis. Both MAD/RUMMAGE and MULTIVARIANCE can analyze any crossed or nested design, including split-plot designs, whether balanced or unbalanced, for either univariate or multivariate analyses. These are the only programs that the author is aware of that can handle unbalanced split-plot designs. Both programs can perform univariate and multivariate analysis of covariance as well.

¹ The MAD/RUMMAGE package is available from Dr. Gale Bryce, Department of Statistics, 204 TMCB, Brigham Young University, Provo, Utah 84602, phone (801) 374-1211, extension 4505. Implementation is on IBM 360/370 as well as several non-IBM systems. The Army Research Institute, Presidio of Monterey Field Unit, has a copy of MAD. The MULTIVARIANCE package is available from International Educational Services, P.O. Box A3650, Chicago, Illinois 60690, phone (312) 684-4920. Implementation is on IBM 360/370, CDC 6000 series, and UNIVAC 1108.

Design Statements. Both the MAD/RUMMAGE and the MULTIVARIANCE programs have their own strengths and weaknesses in particular areas, but together they provide a powerful package that can handle practically any analysis of variance problem. The strengths and weaknesses of each program can be compared. One strength of the MAD/RUMMAGE program is the simplicity of the model statement that defines the analysis of variance design. The user enters model statements in a form familiar to most users, e.g.,


Y(IJ) = A(I) + B(J) + AB(IJ) + E.

If the user is not familiar with entering model statements in this form, he or she can quickly learn to write an analysis of variance model statement for any design by learning a few simple rules. For example, crossing and nesting relationships are indicated by the subscripts in parentheses. A nested relationship exists when there are more subscripts in parentheses associated with a given term than there are letters in the name of that term. The "extra" subscripts identify the terms within which the term of interest is nested. When writing the complete full-rank model for any design, the user can quickly decide which interactions to include and which to exclude by referring to the following rule: an interaction between any two terms in the model should be included if the subscripts in parentheses for the two terms do not contain any subscript in common. Interactions do not enter the model if the two terms share a subscript in parentheses.
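The interaction rule can be applied mechanically. The Python sketch below is a hypothetical helper (not part of MAD/RUMMAGE) that starts from the main terms and their subscripts, repeatedly forms the interaction of any two terms that share no subscript, and prints the resulting model statement. It assumes single-letter term names. Applied to T(I), S(J), C(IJK), and R(L), where C is nested within T and S, it generates the same nine terms as the full-rank split-plot model used later in this paper, though listed in a different order.

```python
def full_rank_terms(main_terms):
    """Close a set of main terms under the 'no common subscript' interaction rule.

    main_terms: list of (name, subscripts), e.g. [("T", "I"), ("C", "IJK")].
    Returns term labels such as "TS(IJ)", main effects first.
    """
    order = [name for name, _ in main_terms]
    terms = {frozenset([name]): frozenset(subs) for name, subs in main_terms}
    changed = True
    while changed:
        changed = False
        for f1, s1 in list(terms.items()):
            for f2, s2 in list(terms.items()):
                # two terms interact only if they share no subscript
                if s1.isdisjoint(s2) and (f1 | f2) not in terms:
                    terms[f1 | f2] = s1 | s2
                    changed = True
    rows = []
    for factors, subs in terms.items():
        ordered = sorted(factors, key=order.index)
        label = "".join(ordered) + "(" + "".join(sorted(subs)) + ")"
        rows.append((len(ordered), [order.index(f) for f in ordered], label))
    rows.sort()
    return [label for _, _, label in rows]


main = [("T", "I"), ("S", "J"), ("C", "IJK"), ("R", "L")]
model_terms = full_rank_terms(main)
print("Y(IJKL) = " + " + ".join(model_terms) + " + E")
```

The helper only decides which terms belong in the model; for an unbalanced design the user still has to choose the order in which they are written, since that order governs how confounded variance is assigned.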

In contrast, the MULTIVARIANCE program requires the user to define the analysis of variance model by entering design matrices. It is more difficult to write correct design matrices, particularly for designs that include nesting and high-order interactions, and more difficult to enter the multiple cards for these matrices into the program, than to write the single model statement required by MAD/RUMMAGE.
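To give a sense of what writing a design matrix involves, the Python sketch below builds a deviation-coded (effect-coded) design matrix, one row per cell, for an a × b crossed factorial, the layout that the single statement Y(IJ) = A(I) + B(J) + AB(IJ) + E describes. Deviation coding is only one of several codings a user might choose, the function names are hypothetical, and MULTIVARIANCE's own input conventions are not reproduced here.

```python
def deviation_codes(level, n_levels):
    """Deviation (effect) codes for one factor: n_levels - 1 columns; last level coded -1."""
    return [1 if level == p else (-1 if level == n_levels - 1 else 0)
            for p in range(n_levels - 1)]


def factorial_design_matrix(a, b):
    """One row per cell of an a x b crossed design: intercept, A, B, and A x B columns."""
    rows = []
    for i in range(a):
        for j in range(b):
            code_a = deviation_codes(i, a)
            code_b = deviation_codes(j, b)
            interaction = [x * y for x in code_a for y in code_b]
            rows.append([1] + code_a + code_b + interaction)
    return rows


X = factorial_design_matrix(2, 3)
for row in X:
    print(row)
```

Even this small 2 × 3 case needs six columns; nested terms and higher-order interactions multiply the bookkeeping.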

Expected Mean Squares. A second strength of the MAD/RUMMAGE program is that it provides a matrix showing the expected mean squares for each term in the model. This is particularly useful when analyzing random or mixed model analysis of variance designs. In the balanced case, the user can immediately identify the appropriate error terms and see how to construct error terms if appropriate ones do not exist. MULTIVARIANCE does not provide expected mean squares, so the user would need to calculate them himself to find the appropriate error terms.
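The way an expected mean squares table identifies error terms can be sketched in Python. The example below hard-codes the standard expected mean squares for a balanced two-factor design with a levels of A, b levels of B, and n observations per cell, under both the random and the fixed model (textbook expressions typed in by hand; this is not MAD/RUMMAGE output). The hypothetical helper looks for a mean square whose expectation equals the tested term's expectation minus the component being tested.

```python
def find_error_term(ems, tested, component):
    """Return the term whose expected mean square equals that of `tested` without
    `component`, or None if no such term exists (a quasi-F would then be needed)."""
    target = {c: k for c, k in ems[tested].items() if c != component}
    for name, expectation in ems.items():
        if name != tested and expectation == target:
            return name
    return None


a, b, n = 3, 4, 2
# Each entry maps a variance component to its coefficient in the expected mean square.
random_model = {
    "A": {"error": 1, "AB": n, "A": n * b},
    "B": {"error": 1, "AB": n, "B": n * a},
    "AB": {"error": 1, "AB": n},
    "Within": {"error": 1},
}
fixed_model = {
    "A": {"error": 1, "A": n * b},
    "B": {"error": 1, "B": n * a},
    "AB": {"error": 1, "AB": n},
    "Within": {"error": 1},
}
for label, ems in (("random", random_model), ("fixed", fixed_model)):
    print(label, {t: find_error_term(ems, t, t) for t in ("A", "B", "AB")})
```

In the random model, A and B are tested against AB rather than against the within-cell error; when the helper returns None, no existing mean square has the required expectation and an error term has to be constructed, as described under Random and Mixed Effects.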


The expected mean square output from MAD/RUMMAGE also helps the researcher identify confounding in unbalanced analysis of variance designs. The researcher's objective may be to arrive at unconfounded sums of squares that have the same expected mean squares as those found in an analogous balanced analysis of variance design. If the researcher is not sure how the terms in the model are confounded in an unbalanced design, the expected mean squares output from MAD/RUMMAGE will give this information, and the researcher can then order and reorder terms in the model to eliminate the confounding. Where the researcher partitions the variance hierarchically, the expected mean squares output shows the nature of the assumptions being made. The researcher is assigning confounded variance to one term rather than to others by assumption, and identifying the confounded terms makes this assumption explicit. The researcher can then examine the confounded terms to see whether it is reasonable to assume that one of them takes precedence over the others.


Expected mean squares are also useful in identifying the confounding that occurs with incomplete block designs (i.e., designs in which there are missing cells). When there are missing cells (no observations in one or more cells), the analysis requires the researcher to assume that particular interactions are zero in order to (a) eliminate confounding and (b) estimate parameters for all terms in the model. These assumptions are similar to assuming that interactions are zero in a Latin square design. The expected mean squares output will tell the researcher which interactions must be assumed to be zero in order to eliminate confounding and obtain estimates for all terms. MULTIVARIANCE will also identify confounded effects in the case of incomplete block designs.


The expected mean square output may also reveal confounding where the researcher does not expect it. For example, adding covariates to a balanced analysis of variance design will produce confounded expected mean squares for the terms in the model, and the researcher may wish to adjust for this confounding.


Unbalanced Split-Plots


MAD/RUMMAGE and MULTIVARIANCE are the only programs known to the author that can handle unbalanced split-plot designs. Unfortunately, a problem arises with both programs in analyzing these designs: when the model statement for the analysis is written in the customary manner, the core storage required by these designs becomes extremely large, exceeding the capacity of nearly all computer installations. Only designs based on very small sample sizes can be processed in the customary manner. A procedure for getting around the core space problem with the MAD/RUMMAGE program is given later in this paper. Future updates of the RUMMAGE program will probably incorporate this procedure as an automatic part of the output for split-plot designs. This would be desirable, since, as the program is written now, it is impractical to obtain multivariate tests for split-plot designs with large sample sizes.

MULTIVARIANCE provides a way around the core space problem: the raw data are transformed, multivariate tests are requested, and selected statistics are then picked up from the multivariate output. The details of this statistical procedure have been described by Bock (1975). A second run is required to get the correct means, since a transformation was made on the raw data in the initial run. One problem with this procedure is that the output for split-plot designs obtained in this way is not labeled correctly. A separate run can be made on another program, such as MAD/RUMMAGE or BMP, to identify the statistics in the MULTIVARIANCE output correctly. MULTIVARIANCE is the only program that can currently perform multivariate tests for unbalanced split-plot designs based on large sample sizes.


Data Management. The model statement for MAD/RUMMAGE is easy to write, and the control card command structure for RUMMAGE has been greatly simplified from the original MAD version of the program. However, the program (a) is poorly documented, (b) has rigid requirements for the form that the input data must be in to be accepted, and (c) has no missing data option. The independent variables must be numbered consecutively from one to the number of levels of the variable and must be in sorted order. These requirements mean that the user with a large data file must enter a program like SPSS to recode variables, if necessary, and eliminate missing data, pass these data on a temporary scratch file to a utility program that can sort the independent variables, and then pass this file to the final job step, where the analysis is made by MAD/RUMMAGE. This can be accomplished in one run but is inconvenient for the user. MULTIVARIANCE provides a user-supplied subroutine for missing data and concatenation with SPSS for data selection, recoding, etc. MULTIVARIANCE also provides a variety of options for inputting the data into the program. In general, MULTIVARIANCE is considerably more convenient than MAD/RUMMAGE for the user with large data files making multiple analyses with the same or similar designs. The MAD/RUMMAGE program, on the other hand, provides useful information about confounding and about hypothesis testing with random and mixed models.
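The preprocessing that MAD/RUMMAGE requires (complete cases only, levels numbered consecutively from one, records sorted on the independent variables) corresponds to the recode-and-sort job steps described above. The Python sketch below shows the same three steps on an in-memory list of records; the record layout, the use of None for missing values, and the function name are assumptions for illustration, and MAD/RUMMAGE's actual input card format is not reproduced.

```python
def prepare_for_rummage(records, factors, response):
    """Drop incomplete records, renumber factor levels 1..k, and sort by the factors.

    records: list of dicts; None marks a missing value (an assumption of this sketch).
    Returns (rows, codes): rows are tuples (level codes..., response);
    codes maps each factor to {original label: consecutive code}.
    """
    needed = list(factors) + [response]
    complete = [rec for rec in records if all(rec.get(k) is not None for k in needed)]
    codes = {}
    for f in factors:
        labels = sorted({rec[f] for rec in complete})
        codes[f] = {label: i + 1 for i, label in enumerate(labels)}
    rows = [tuple(codes[f][rec[f]] for f in factors) + (rec[response],)
            for rec in complete]
    rows.sort(key=lambda row: row[:-1])
    return rows, codes


records = [
    {"group": "treat", "time": 3, "y": 5.0},
    {"group": "ctrl", "time": 1, "y": 4.0},
    {"group": "treat", "time": 1, "y": None},  # missing response: dropped
    {"group": "ctrl", "time": 3, "y": 6.5},
]
rows, codes = prepare_for_rummage(records, ["group", "time"], "y")
print(codes)
print(rows)
```

Level labels are numbered in sorted order, so within each factor they must be mutually comparable (all strings or all numbers).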


Discriminant Analyses. MULTIVARIANCE provides a wider variety of multivariate statistics than MAD/RUMMAGE, including discriminant analysis and canonical correlation. It is often useful to follow up significant multivariate analysis of variance tests with discriminant analyses to identify the particular dependent variables that were influenced most by a given independent variable. MULTIVARIANCE can provide discriminant analyses for any term in any multivariate analysis of variance design. Discriminant analyses are not available with MAD/RUMMAGE. RUMMAGE will, however, provide analyses of categorical data with log-linear models, as described in the Categorical Data section of this paper.


A METHOD FOR ANALYSIS OF SPLIT-PLOTS


One of the chief limitations of the MAD/RUMMAGE program, as it is currently written, is its inability to process split-plot designs based on a large sample size. Even moderately sized samples very quickly exceed the core limitations of most computer centers. This problem is unique to split-plot designs; other designs, such as factorial designs, can readily be processed even with very large sample sizes. A procedure for getting around the core space problem with MAD/RUMMAGE is given in this section of the paper.

The problem arises with split-plot designs because they have more than one "error" term. These designs include one whole-plot error term, used to test the significance of the between-subjects or between-plots (nonrepeated-measures) terms, and one or more split-plot error terms, used to test the significance of the repeated-measures or split-plot term(s) and their interactions. The whole-plot error term consists of a random subjects or plots term nested within the between-subjects or between-plots (nonrepeated) terms, while each split-plot error term consists of the interaction between a repeated-measures (split-plot) term and the whole-plot error term. The model statement in the current MAD/RUMMAGE program allows a person to include any number of "error" terms; however, only the last of these error terms does not add to the core space required by the computer. Each error term except the last one adds a dramatic amount to the required core space.

A method for analyzing these split-plot designs, suggested by Hendrix (1975), involves dividing the problem between the MAD/RUMMAGE program and another program that can handle balanced repeated-measures designs. This approach has several disadvantages: (a) it requires writing model statements that are unique to the dividing procedure; (b) it requires a fair amount of hand calculation (subtraction); (c) it requires two computer programs; and (d) it cannot handle multivariate analysis of variance.


Split-Plot Example

A different method for analyzing these split-plot designs, which shares some of the previous limitations but can be carried out within the MAD/RUMMAGE program alone, is shown below. The complete full-rank model of a split-plot design can be written as follows for the MAD/RUMMAGE program:


Y(IJKL) = T(I) + S(J) + TS(IJ) + C(IJK) + R(L) + TR(IL)
          + SR(JL) + TSR(IJL) + CR(IJKL) + E.                (1)
In this design, the terms T and S are between-subjects or between-plots (nonrepeated) terms, and R is the repeated-measures or split-plot term. As the model is written above, C is the whole-plot error term, used to test the significance of the terms that precede it in the model, and CR is the split-plot error term. The term E, as written above, contains no degrees of freedom and serves only to terminate the model.

The C and CR terms above require a great deal of core storage. The MAD/RUMMAGE program is written so that the E term terminates the model and also collects the sums of squares due to any terms that are deleted from the complete full-rank model. This being the case, it is possible to delete the CR term from the model (as written above) immediately and let its sums of squares be collected by the E term. However, the C term will still make the problem exceed storage capacity for all but the smallest samples. Both the C and the CR terms can therefore be deleted from the model, as follows:


Y(IJKL) = T(I) + S(J) + TS(IJ) + R(L) + TR(IL) + SR(JL) + TSR(IJL) + E.        (2)


In this case, the E term contains the sums of squares for both the C and CR terms. The sums of squares, degrees of freedom, estimated means, etc., for all terms in the model other than E are correct. The problem now becomes one of separating the sums of squares for the C and CR terms, which are confounded within the E term.


A separate run can be made on MAD to obtain the correct sums of squares for the C term, and the correct sums of squares for the CR term can then be obtained by subtraction from the E term in (2) above. Specifically, the individual responses or scores can be summed across all levels of the repeated factor R, and these sums can be run with a MAD/RUMMAGE model that includes the between-subjects or between-plots (nonrepeated) terms, T and S, and excludes the repeated-measures term R:

Y(IJK) = T(I) + S(J) + TS(IJ) + E.        (3)

The sums of squares for the E term in (3), once divided by the number
of levels of the repeated factor R, equal the sums of squares for the C
(whole-plot error) term in (1). The division is needed because each
subject's total is the sum of as many scores as there are levels of R,
which inflates its sum of squares by that factor. The whole-plot error
sums of squares are thus obtained by dividing the E in (3) by the number
of levels of R, and the sums of squares for CR are obtained by subtracting
the whole-plot error from the E in (2). The degrees of freedom for the
whole-plot error as obtained in (3) are correct, and the degrees of
freedom for the CR term are obtained by subtracting the whole-plot error
degrees of freedom from the degrees of freedom given for the E term in (2).
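The two hand calculations just described can be sketched in Python. This is a minimal sketch under stated assumptions, not part of MAD/RUMMAGE. The function names and the small data set are hypothetical. The run (3) model is assumed to contain T, S, and T x S, so that its E term is the pooled within-cell sum of squares of the subject totals. The pooled E of 19.0 on 4 df is simply what run (2) would give for these made-up scores.

```python
from collections import defaultdict

def whole_plot_error(scores, cells):
    """Whole-plot (C) error obtained from subject totals, as in run (3).

    scores: one list per subject, one score per level of the repeated factor R.
    cells:  the between-subjects cell (e.g. a (T, S) pair) of each subject.
    Assumes the run (3) model contains T, S and T x S, so its E term is the
    pooled within-cell sum of squares of the totals.
    """
    r = len(scores[0])
    groups = defaultdict(list)
    for row, cell in zip(scores, cells):
        groups[cell].append(sum(row))  # sum each subject's scores across levels of R
    ss_e3 = sum(sum((t - sum(g) / len(g)) ** 2 for t in g) for g in groups.values())
    df = sum(len(g) for g in groups.values()) - len(groups)
    return ss_e3 / r, df  # dividing by r gives the C sums of squares; df unchanged

def split_pooled_error(pooled_ss, pooled_df, c_ss, c_df):
    """CR = pooled E from run (2) minus the whole-plot error C."""
    return pooled_ss - c_ss, pooled_df - c_df

# Hypothetical data: four subjects in two between-subjects cells, R has two levels.
scores = [[1, 2], [3, 5], [5, 6], [2, 2]]
cells = [("t1", "s1"), ("t1", "s1"), ("t2", "s1"), ("t2", "s1")]
c_ss, c_df = whole_plot_error(scores, cells)
print(c_ss, c_df)                                # 18.5 2
print(split_pooled_error(19.0, 4, c_ss, c_df))   # (0.5, 2)
```

Each total is r times the subject's mean. Its squared deviations are therefore r squared times those of the means, while the whole-plot error is only r times them. Dividing once by r restores the correct scale.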


General  Split-Plot  Procedure 


The  above  approach  can  be  generalized  to  any  split-plot  analysis 
of  variance  design  as  follows: 

1.  Any  split-plot  design  can  be  analyzed,  but  each  design  will 
require  as  many  separate  runs  with  different  model  statements  as  there 
are  error  terms  in  the  model. 

2. The first run should include all terms in the complete full-rank
model except for the error terms, which should be deleted. This run will
produce the correct sums of squares and degrees of freedom for all terms
included except for the E term, which will contain the combined sums of
squares and degrees of freedom for all error terms in the model.


3. The whole-plot error term can be obtained in a separate run by
summing individual scores across all levels of each repeated-measures
factor in the model. If there is more than one repeated-measures factor,
the scores should be summed over all levels of all of these factors.
These sums are then run on MAD/RUMMAGE using a model statement that
includes the terms tested by the whole-plot error and excludes the terms
tested by the split-plot error(s). The sums of squares for the E term of
this model are divided by the number of scores summed for each subject,
that is, by the product of the numbers of levels of the repeated-measures
factor(s) in the model.

4.  When  there  is  more  than  one  split-plot  error  term,  one  run 
with  a  distinct  model  statement  is  required  for  each  split-plot  error 
term  in  the  model,  except  for  the  one  that  is  entered  last. 

a. The first split-plot error term is obtained by summing individual
scores across all levels of the repeated-measures factor(s) except for
the repeated-measures factor that enters into the error term being
obtained. A hypothetical repeated-measures factor B, for example, should
be tested by the B x subjects interaction, so in this case individual
scores should be summed across all levels of any other repeated-measures
factors in the model, but not across B. A run is then made on MAD/RUMMAGE
with a model that includes all terms tested by the whole-plot error and
all terms tested by the B x subjects interaction. All terms tested by the
other split-plot error terms in the model are excluded. All error terms
except for the final E should, of course, also be excluded from the model
statement, so that in this case the E collects the sums of squares for
the whole-plot error plus the B x subjects interaction. The sums of
squares for the E term resulting from this run should be divided by the
product of the numbers of levels of the repeated-measures factors in the
model other than B.

The correct sums of squares for the B x subjects interaction can be
obtained by subtracting the whole-plot error term from the E term
obtained in this run, after that E term has been divided as described
above.
b. The second split-plot error is obtained in the same manner as the
first, by (a) summing individual scores across levels of the repeated
factors that do not enter into the error term of interest, (b) analyzing
these sums with a MAD/RUMMAGE model that includes the terms tested by the
whole-plot error as well as the terms tested by the split-plot error of
interest, (c) dividing the resultant sums of squares for the E term by
the number of scores that went into each sum (the product of the numbers
of levels of the repeated factors summed over), and (d) subtracting the
whole-plot error from this result.

c. The final split-plot error term can be obtained by subtracting
each of the previously obtained error terms from the E term that was
obtained in the first run. Degrees of freedom for this final split-plot
term should also be obtained by subtracting the degrees of freedom for
the previously obtained error terms from the degrees of freedom for the
E term obtained in the first run. The E term from this first run
collected the sums of squares and degrees of freedom for all error terms.
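The bookkeeping in steps 2 through 4 can be sketched as follows. The sketch assumes that each run's E sums of squares and degrees of freedom have already been read from the MAD/RUMMAGE output. The function name, the dictionary layout, and the example numbers are hypothetical. The example uses two repeated factors, A and B, with 2 and 3 levels. Degrees of freedom are subtracted without any division, because each run's E degrees of freedom already refer to that run's summed data.

```python
from math import prod

def recover_error_terms(pooled, whole_plot_run, split_runs, levels, last_name):
    """Recover every error term of a split-plot design from separate runs.

    pooled:         (ss, df) of E from the first run, all error terms deleted.
    whole_plot_run: (ss, df) of E from the run on totals over all repeated factors.
    split_runs:     {factor: (ss, df)} E from the run for factor x subjects,
                    for every split-plot error except the last one.
    levels:         {factor: number of levels} for each repeated-measures factor.
    last_name:      label for the final split-plot error, found by subtraction.
    """
    # Step 3: divide by the number of scores summed per subject.
    wp_ss = whole_plot_run[0] / prod(levels.values())
    wp_df = whole_plot_run[1]
    errors = {"whole-plot": (wp_ss, wp_df)}
    # Step 4a/4b: divide by the levels summed over, then remove the whole-plot error.
    for factor, (ss, df) in split_runs.items():
        divisor = prod(n for f, n in levels.items() if f != factor)
        errors[f"{factor} x subjects"] = (ss / divisor - wp_ss, df - wp_df)
    # Step 4c: the last error term is what remains of the pooled E.
    errors[last_name] = (pooled[0] - sum(ss for ss, _ in errors.values()),
                         pooled[1] - sum(df for _, df in errors.values()))
    return errors

# Hypothetical E terms for a design with repeated factors A (2 levels) and B (3 levels).
print(recover_error_terms((300.0, 60), (120.0, 10),
                          {"A": (150.0, 20), "B": (140.0, 30)},
                          {"A": 2, "B": 3}, "A x B x subjects"))
```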


Limitations

The preceding approach will only work when balance exists across
all levels of each of the repeated-measures factors. In other words,
there needs to be one observation per cell across all levels of each
repeated-measures factor. If a subject's data are missing at one level
of a repeated factor, that subject's data for all levels of the repeated
factor need to be removed. However, the approach is appropriate when
imbalance (unequal cell sizes) exists for the nonrepeated, between-subjects
factors.
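Because the method needs complete data on the repeated-measures factors, subjects with any missing repeated score have to be dropped before the runs. The following is a minimal listwise-deletion sketch. It assumes, as a hypothetical convention, that missing scores are coded as None.

```python
def drop_incomplete_subjects(scores, cells):
    """Keep only subjects with a score at every repeated-measures level.

    scores: one list per subject; None marks a missing score (assumed coding).
    cells:  the between-subjects cell of each subject, kept in step with scores.
    """
    kept = [(row, cell) for row, cell in zip(scores, cells)
            if all(x is not None for x in row)]
    return [row for row, _ in kept], [cell for _, cell in kept]

# Hypothetical data: the second subject is missing one repeated score.
print(drop_incomplete_subjects([[1, 2], [3, None], [5, 6]], ["a", "a", "b"]))
```

The resulting unequal between-subjects cell sizes are acceptable for this approach, as noted above.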

A one-way repeated-measures design cannot be divided into two runs
on the MAD/RUMMAGE program. The "whole-plot" error cannot be obtained
with a separate run, since there is no term in the model tested by
"whole-plot" error. However, the initial MAD/RUMMAGE run can be made
with the random subjects factor, which is used as a blocking factor,
deleted. The error term will then include the subjects x repeated-factor
error term plus the random subjects factor. The random subjects factor
could be calculated with a separate Fortran routine. One-way
repeated-measures designs can, however, readily be handled with other
programs in the univariate case. Unfortunately, most other programs
cannot handle the multivariate case, and the core space problem may
prohibit the multivariate case from being run with the MAD/RUMMAGE program.
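The separate routine for the random subjects factor mentioned above was written in Fortran. The sketch below is a hypothetical Python analog that assumes complete data, with one score per subject at each level of the repeated factor. The subjects term is r times the sum of squared deviations of subject means from the grand mean, on n - 1 degrees of freedom. Subtracting it from the pooled E of the run without subjects leaves the subjects x R error.

```python
def subjects_ss(scores):
    """SS and df for the random subjects (blocking) factor.

    scores: one list per subject, one score per level of the repeated factor.
    """
    n, r = len(scores), len(scores[0])
    grand_mean = sum(map(sum, scores)) / (n * r)
    ss = r * sum((sum(row) / r - grand_mean) ** 2 for row in scores)
    return ss, n - 1

def subjects_x_r_error(pooled_ss, pooled_df, scores):
    """Subjects x R error = pooled E (run with subjects deleted) minus subjects."""
    ss_s, df_s = subjects_ss(scores)
    return pooled_ss - ss_s, pooled_df - df_s

# Hypothetical scores: three subjects, two levels of R.
# For these scores the run without subjects gives a pooled E of 38/3 on 4 df.
scores = [[1, 2], [3, 4], [5, 5]]
print(subjects_ss(scores))
print(subjects_x_r_error(38 / 3, 4, scores))
```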

The preceding approach is not practical for multivariate analysis
of variance with any split-plot design. With multivariate analysis of
variance there is a sums of squares and cross products matrix associated
with each term in the model. The test statistic in multivariate analysis
of variance (Wilks' lambda) is the determinant of the error matrix divided
by the determinant of the sum of the error matrix plus the matrix for the
term being tested. The subtraction approach gives incorrect determinants
in the multivariate case, since the determinant of the difference between
two matrices is not equal to the difference between their determinants.
To obtain the correct determinants, the matrix for the whole-plot error
would have to be subtracted from the matrices for the other terms in the
model and determinants calculated for these differences. Although the
appropriate matrices can be obtained from the MAD/RUMMAGE package, the
amount of calculation required to subtract matrices and calculate
determinants clearly exceeds what is practical to do by hand.
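A short numerical illustration shows why the subtraction approach breaks down for multivariate analysis of variance. It uses NumPy and hypothetical 2 x 2 sums of squares and cross products matrices. The statistic is Wilks' lambda, det(E) / det(E + H). Subtracting determinants gives a different answer from taking the determinant of the subtracted matrices.

```python
import numpy as np

def wilks_lambda(error, hypothesis):
    """Wilks' lambda: det(E) / det(E + H)."""
    return np.linalg.det(error) / np.linalg.det(error + hypothesis)

# Hypothetical SSCP matrices for two dependent variables.
pooled_e = np.array([[10.0, 2.0], [2.0, 8.0]])   # pooled E from the first run
whole_plot = np.array([[4.0, 1.0], [1.0, 3.0]])  # whole-plot error matrix
h = np.array([[3.0, 0.0], [0.0, 2.0]])           # matrix for the term tested

split_e = pooled_e - whole_plot                  # subtract matrices, not determinants
print(np.linalg.det(split_e))                                 # about 29
print(np.linalg.det(pooled_e) - np.linalg.det(whole_plot))    # about 65
print(wilks_lambda(split_e, h))                               # about 0.468
```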


CATEGORICAL  DATA 

Psychologists often collect data that are measured on nominal or
ordinal scales. Results are often expressed in the form of frequency
tabulations in one-way, two-way, and multiway tables. Such data are
often analyzed with the traditional Pearson chi-square statistic,
applied repeatedly to different subsets of the possible two-dimensional
tables. Army researchers often have occasion to measure such nominal
variables as race, sex, MOS, and mission type (combat, support).
Categorical data of this nature are frequently analyzed by applying the
Pearson chi-square statistic to all possible two-way tables, using the
SPSS Crosstabs procedure. Even ordinal data, including ordinal
questionnaire responses, are often expressed in terms of the percentages
of subjects who selected particular responses, particularly since data
presented in this way are easily interpreted by nonresearchers in terms
of the original scales. However, it is often difficult, and in some cases
impossible, for a researcher to test the hypotheses of interest in terms
of two-way contingency tables. In many cases the researcher runs multiple
tests in order to test hypotheses that have been stated in a fragmentary
form.

The use of linear regression models for the analysis of
multidimensional categorical tables has been described by Grizzle,
Starmer, and Koch (1969). This approach provides a comprehensive method
for the statistical analysis of qualitative data that is directly
analogous in scope and power to multiple regression and multivariate
analysis of variance as applied to quantitative data (Koch & Reinfurt,
1970). For many hypotheses, it provides a better method of testing than
the repeated application of the Pearson chi-square. Applications of this
methodology are beginning to appear in the social science literature
(see Giles, Gatlin, & Cataldo, 1976). This least squares approach to the
analysis of categorical data has been programmed and is available as a
Fortran program called GENCAT (Landis, Stanish, Freeman, & Koch, 1976).²


² This program has been implemented at IBM 360/370 installations. It
will shortly be modified to be compatible with non-IBM machines. The
program is available from Dr. Richard Landis, Dept. of Biostatistics,
University of Michigan, Ann Arbor, Michigan 48109. The Army Research
Institute, Presidio of Monterey Field Unit, has a copy of this program.




Defining Categorical Data Models

Just as with analysis of variance, the type of analysis that is made
with categorical data depends on the specification of the underlying
model. The underlying model depends on the sampling plan of the
experiment. The first step in specifying the model for the analysis is
to distinguish between (a) the variables that measure the experimental
conditions or subgroups to which subjects belong and (b) the variables
that measure what subsequently happens to subjects. All possible
combinations of levels of the variables measuring experimental conditions
or subgroups define the "populations" or "factors" in the design, and the
possible combinations of levels of the variables measuring what happens
to subjects define the "responses." A table of proportions, dimensioned
by the number of populations and the number of response combinations, is
entered into GENCAT.
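The construction of populations and responses can be sketched as cross-classifications of variable levels. This is an illustrative Python sketch, not GENCAT's input format. Records are assumed to be dictionaries, and the variable names in the example are hypothetical. A population with no subjects is filled with zeros by assumption.

```python
from collections import Counter
from itertools import product

def proportion_table(records, factor_vars, response_vars, levels):
    """Build the populations x responses table of proportions.

    records: list of dicts, one per subject (variable name -> observed level).
    levels:  dict mapping each variable name to its list of possible levels.
    """
    # Populations and responses are all combinations of levels of their variables.
    populations = list(product(*(levels[v] for v in factor_vars)))
    responses = list(product(*(levels[v] for v in response_vars)))
    counts = Counter((tuple(rec[v] for v in factor_vars),
                      tuple(rec[v] for v in response_vars)) for rec in records)
    table = []
    for pop in populations:
        row = [counts[(pop, resp)] for resp in responses]
        n = sum(row)
        table.append([c / n if n else 0.0 for c in row])  # each nonempty row sums to 1
    return populations, responses, table

records = [  # hypothetical subjects
    {"race": "Black", "rank": "Enlisted", "art15": "yes"},
    {"race": "Black", "rank": "Enlisted", "art15": "no"},
    {"race": "White", "rank": "Enlisted", "art15": "no"},
]
levels = {"race": ["Black", "White"], "rank": ["Enlisted"], "art15": ["yes", "no"]}
pops, resps, table = proportion_table(records, ["race", "rank"], ["art15"], levels)
print(table)  # [[0.5, 0.5], [0.0, 1.0]]
```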

Multidimensional contingency tables are entered into GENCAT according
to how the model has been defined in terms of populations and responses.
Several general types of models can be identified:

1. No factor, multiresponse;

2. Unifactor, multiresponse;

3. Multifactor, uniresponse; and

4. Multifactor, multiresponse.

Only Models 1 and 2 can occur with two-way tables, and only Models 1
through 3 with three-way tables; with tables of four or more dimensions,
all four models can occur.

Model 1. The questions that are asked in the case of the no factor,
multiresponse model are analogous to questions that would be asked in
repeated-measures analysis of variance designs where all the factors
(one or more) in the design are repeated-measures factors. A problem of
interobserver agreement could also fit under Model 1. Since all observers
rate the same person or situation, the ratings fit the mold of a
repeated-measures analysis of variance design. In this case, however,
the hypotheses of interest would include not only tests of the
differences between proportions but also agreement hypotheses: is
agreement different from that expected by chance alone?
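One widely used index of agreement beyond chance for two observers is Cohen's kappa. It appears here only as an illustration of the chance-agreement idea, not as the test GENCAT performs. The cross-classification of counts in the example is hypothetical.

```python
def cohen_kappa(table):
    """Cohen's kappa for two observers rating the same subjects.

    table[i][j]: number of subjects placed in category i by observer 1
    and in category j by observer 2.
    """
    total = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / total         # observed agreement
    row_m = [sum(table[i]) / total for i in range(k)]             # observer 1 margins
    col_m = [sum(table[i][j] for i in range(k)) / total for j in range(k)]
    expected = sum(row_m[i] * col_m[i] for i in range(k))         # chance agreement
    return (observed - expected) / (1 - expected)

print(cohen_kappa([[20, 5], [10, 15]]))  # about 0.4
```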

Model 2. In the case of unifactor, multiresponse tables, the questions
asked are analogous to those asked with one-way multivariate analysis of
variance designs. In designs of this nature, the researcher is interested
in the association among dependent or response variables as well as the
influence of the independent or factor variable on the response
variables. With one factor and a series of r response categories,
questions asked include (a) the influence of the factor on the marginal
distribution of the response, and (b) the influence of the factor on the
joint distribution of the r response categories.


Model 3. In the case of multifactor, uniresponse tables, the design is
directly analogous to factorial analysis of variance designs. Here the
researcher wishes to determine how the factors or independent variables
combine to produce the response or dependent variable. The researcher
can test for "main effects" for the factors and for "interaction
effects," just as in analysis of variance, except that in this case the
researcher is looking at differences between proportions instead of
means. An example of a multifactor, uniresponse problem may be
instructive at this point. Table 1 presents a hypothetical factor by
response matrix of proportions in a form that could be entered into
GENCAT. Both columns of proportions are entered into the program, but a
transformation matrix is entered to eliminate the second column, because
(a) we are only interested in comparing the proportions who received
Article 15's, and (b) computations cannot be made when singularity exists
(i.e., when the rows add up to 1.0). Singularity also exists when a
proportion in the table is zero. When a zero enters the table, either the
levels of the factors must be collapsed to eliminate the zero, or the zero
must be replaced by a small proportion to eliminate the singularity. The
GENCAT output for Table 1 would include one chi-square statistic testing
significance for the main effect of Race, one for Rank, and one for the
Race x Rank interaction.


Table 1

Example of Multifactor, Uniresponse Problem

                        Proportion receiving    Proportion not receiving
Race      Rank          Article 15              Article 15

Black     Enlisted      .30                     .70
Black     Officer       .00                     1.00
White     Enlisted      .20                     .80
White     Officer       .02                     .98

Note. Race and Rank define the factors or populations; the response
is defined by receiving or not receiving Article 15 punishment.
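The main-effect and interaction tests for Table 1 can be sketched with the weighted least squares approach of Grizzle, Starmer, and Koch. This is a minimal NumPy sketch, not GENCAT itself, and it rests on three assumptions. First, the sample sizes of 100 per population are hypothetical, since Table 1 gives none. Second, the .00 proportion is replaced by .01, as the text suggests for zero cells. Third, the effect coding of the design matrix is one choice among several. The transformation matrix keeps only the first response column, which removes the singularity of the full covariance matrix.

```python
import numpy as np

def gsk_wald(props, n, X, contrasts):
    """Weighted least squares fit and Wald chi-squares for linear functions of proportions.

    props:     populations x response categories; each row sums to 1.
    n:         sample size for each population.
    X:         design matrix for the retained response functions.
    contrasts: {name: row vector C}, each tested as C b = 0.
    """
    props = np.asarray(props, dtype=float)
    s, r = props.shape
    # Multinomial covariance of each population's proportions, block-diagonal overall.
    V = np.zeros((s * r, s * r))
    for i in range(s):
        p = props[i]
        V[i * r:(i + 1) * r, i * r:(i + 1) * r] = (np.diag(p) - np.outer(p, p)) / n[i]
    # Transformation matrix keeping only the first response column of each row.
    A = np.kron(np.eye(s), np.eye(r)[:1])
    F = A @ props.ravel()
    VF_inv = np.linalg.inv(A @ V @ A.T)
    cov_b = np.linalg.inv(X.T @ VF_inv @ X)
    b = cov_b @ X.T @ VF_inv @ F
    chi2 = {}
    for name, C in contrasts.items():
        C = np.atleast_2d(np.asarray(C, dtype=float))
        cb = C @ b
        chi2[name] = float(cb @ np.linalg.solve(C @ cov_b @ C.T, cb))
    return b, chi2

# Effect-coded saturated design: intercept, Race, Rank, Race x Rank.
X = np.array([[1, 1, 1, 1],     # Black, Enlisted
              [1, 1, -1, -1],   # Black, Officer
              [1, -1, 1, -1],   # White, Enlisted
              [1, -1, -1, 1]],  # White, Officer
             dtype=float)
table = [[.30, .70], [.01, .99], [.20, .80], [.02, .98]]  # .00 replaced by .01
n = [100, 100, 100, 100]                                   # hypothetical sample sizes
b, chi2 = gsk_wald(table, n, X, {"Race": [0, 1, 0, 0],
                                 "Rank": [0, 0, 1, 0],
                                 "Race x Rank": [0, 0, 0, 1]})
print(chi2)
```

With this saturated design, each Wald chi-square has 1 degree of freedom. Each effect is estimated jointly with the others rather than from a separate two-way table.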


The GENCAT results can be briefly compared to traditional results.
Separate Pearson chi-square statistics could readily have been computed
for Table 1 for the effects of Race and Rank, but not for the interaction
between these factors. With the GENCAT approach, each term in the model
is adjusted for the other terms, in a manner analogous to least squares
analysis of variance or multiple regression, which would not have been
the case had two Pearson chi-square statistics been computed. Also, when
multiresponse models are entered, GENCAT automatically adjusts for the
correlation between the multiple responses. The adjustment is made by
defining the appropriate variables as "response" instead of "factor"
variables. Traditional texts have long noted that the appropriate
chi-square test statistic depends on whether the variables are correlated
or not (Ferguson, 1966). Fortunately, adjustment for correlation can be
made by the appropriate selection of the model for GENCAT.


Model 4. Finally, multifactor, multiresponse models are analogous to
factorial, multivariate analysis of variance designs, or to split-plot
analysis of variance designs. In these designs, questions are asked about
relationships among the dependent or repeated-measures variables as well
as the way factors combine to affect the responses.


An Example: The ARI Representation Index

Appropriate test statistics are derived by entering a sequence of
transformation, design, and contrast matrices into GENCAT. These matrices
operate on the vector of proportions so as to define the specific
contrasts of interest. Linear, logarithmic (log_e), and exponential
transformations of the proportions are possible, and these
transformations affect the nature of the hypotheses that are tested. The
following ARI research example shows how.

As one approach to identifying possible areas of institutional
discrimination in the Army, Nordlie, Thomas, and Sevilla (1975)
constructed a Representation Index³ as a quantitative measure of how
promotions, punishments, education, etc., have been distributed among
whites and nonwhites. This Representation Index is numerically
equivalent to a simple linear transformation of the ratio of two
proportions. In other words, with this index we are comparing the ratio
of the proportion of blacks who receive a given action to the proportion
of whites who receive this action, and then transforming this quantity
linearly so that it will have an origin of zero. So far no statistical
tests have been made of whether a given Representation Index is
significantly different from zero, or whether the Representation Index
for one group (e.g., blacks) is significantly different from the one for
another group (e.g., Spanish). With Army-wide samples, these tests may
not always be relevant; however, tests of this nature are important when
representation indexes are computed with smaller sample sizes and the
researcher wants to know whether chance variation is responsible for the
size of the indexes. A Pearson chi-square statistic could be computed on
the difference between proportions, but the Representation Index measures
a ratio rather than a difference. To test the significance of the
Representation Index (or the ratio) directly, a logarithmic
transformation is made on the two proportions, and the test statistic is
then computed on the difference between the logarithmically transformed
proportions, which is in fact a test of the ratio of the proportions.
Tests of significance for the Representation Index can readily be made
with GENCAT. These tests of significance can also be made using a
categorical data feature of the RUMMAGE program.

³ $$\text{Representation Index} = \frac{\text{actual number}}{\text{expected number}} - 1$$

where actual number equals the number of minorities receiving a
particular action, and expected number equals the expected proportion of
minorities if there is no association between the event and skin color,
times the number of individuals receiving the particular action.
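The logarithmic test described above can be sketched for two independent groups. This hedged Python sketch uses the delta-method variance of a log proportion, (1 - p) / (n p), which is one standard way to carry out such a test. The counts in the example are hypothetical, and the function is not the GENCAT or RUMMAGE implementation.

```python
import math

def log_ratio_test(x1, n1, x2, n2):
    """Wald chi-square (1 df) for H0: p1 / p2 = 1, computed on log proportions.

    x1 of n1 subjects in group 1 and x2 of n2 in group 2 receive the action.
    Returns (ratio, chi-square, p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    f = math.log(p1) - math.log(p2)                      # log of the ratio p1 / p2
    var = (1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2)    # delta-method variance of f
    chi2 = f * f / var
    p_value = math.erfc(math.sqrt(chi2 / 2))             # upper tail, chi-square with 1 df
    return p1 / p2, chi2, p_value

# Hypothetical counts: 30 of 100 minority soldiers versus 20 of 200 white soldiers.
ratio, chi2, p = log_ratio_test(30, 100, 20, 200)
print(ratio, chi2, p)
```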

As this example demonstrates, transformations affect the nature of
the hypothesis that is tested. Transformations are often made with
analysis of variance in an attempt to normalize the distribution of
scores. It should be noted that these transformations alter the nature
of the hypothesis that is tested as well as the nature of the
distribution of scores.
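The following small numerical illustration uses hypothetical proportions to show how the transformation changes the hypothesis. Two units can show exactly the same difference in proportions, and so no interaction on the linear scale. Their ratios can still differ sharply, which is an interaction on the log scale.

```python
import math

# Hypothetical proportions receiving an action: (minority, white) in two units.
unit_a = (0.30, 0.20)
unit_b = (0.12, 0.02)

def difference(p):
    """Linear scale: p_minority - p_white."""
    return p[0] - p[1]

def log_ratio(p):
    """Log scale: log(p_minority / p_white)."""
    return math.log(p[0]) - math.log(p[1])

print(difference(unit_a) - difference(unit_b))  # about 0: no interaction on the linear scale
print(log_ratio(unit_a) - log_ratio(unit_b))    # log(1.5) - log(6), clearly nonzero
```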


Categorical Data Versus Analysis of Variance

There are several advantages to analyzing data using GENCAT rather
than analysis of variance:

1. The GENCAT approach requires the researcher to make fewer
assumptions about the nature of the data.

2.  Much  of  the  data  collected  by  psychologists  is  nominal  or 
ordinal,  often  not  very  reliable,  and  can  best  be  represented 
as  categorical  variables. 

3.  Results  can  be  expressed  as  percentages  and,  as  such,  are 
readily  interpretable  by  the  Army  and  other  nonresearchers. 

In many cases, however, results may not differ much from analogous
analysis of variance results. Also, the documentation for the program
was written by statisticians for an audience well versed in matrix
manipulations. The way in which transformation, design, and contrast
matrices are entered into GENCAT is not at all obvious to nonstatisticians
who have not worked with the program.


CONCLUSION 

The generalized programs mentioned previously (MAD/RUMMAGE,
MULTIVARIANCE, and GENCAT) are large programs that took several man-years
to write (e.g., MAD has over 12,000 Fortran statements). Bugs have been
eliminated over several years' experience with the programs. Together
these programs provide powerful tools for psychologists doing field
research. Psychologists often find themselves with (a) unequal sample
sizes in field experiments due to lack of control, (b) multiple dependent
variables as part of an evaluation research design, and (c) large
quantities of nominal or ordinal categorical data. The generalized
software packages described here can handle many of the analysis
requirements for the types of data listed above.